{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7a3a1515-653b-4a95-b6c1-4a21187dea78",
   "metadata": {},
   "source": [
    "# Verifying Optimized Structures\n",
    "## Objective\n",
    "\n",
    "The objective of this tutorial is to check if an optimized truncated model is consistent with the greater protein environment. This is useful for determining if the geometry of the optimized truncated model is feasible if the surrounding protein were present.\n",
    "\n",
    "This workflow allows you to:\n",
    "\n",
    "- Compare the optimized structure with the original structure file.\n",
    "- Verify if the optimized structure is probable in the original structure file.\n",
    "\n",
    "It is recommended to first visit the QM-only calculation workflow since the first half of the portion stems from it. You can find it [here](https://qmzyme.readthedocs.io/en/latest/Examples/QM-only%20Calculation.html).\n",
    "\n",
    "In this specific example, we are using ketosteroid isomerase (KSI) as the model system. The structure for KSI is obtained from the PDB [1OH0](https://doi.org/10.2210/pdb1OH0/pdb). In this PDB, KSI is bound to equilenin (EQU), which is an analogue of the dienolate reaction intermediate. Quite often, it is seen that resolved crystal structures contain non-reactive analogues; however, most QM studies aim to study the reaction path. Since our purpose is to examine the activity of KSI, we need to replace EQU with the intermediate analogue, 19NT, then reoptimize with our substrate, 5AND. Since EQU and 19NT are structural analogues, we can skip a complicated docking procedure and use QMzyme to align and replace EQU with 19NT, providing our reactive starting structure. When undergoing geometry optimizations, it is possible that the optimized structure might be in an orientation that clashes with the initial structure model. To resolve this, we need to identify if our optimized KSI structure with 19NT bound does not clash with the initial crystal structure.\n",
    "\n",
    "In this tutorial, we will use a distance cutoff of 3 Å to undergo geometry optimization. Then, we will load our full environment from a model_cutoff_3.pkl file, import the optimized geometry output file, align them using C-alpha atoms, and examine if the resulting optimized structure is feasible.\n",
    "\n",
    "## Classes used in this example\n",
    "\n",
    "- [Generate Model](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.GenerateModel.html)\n",
    "- [QM_method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.CalculateModel.html#qm-treatment)\n",
    "- [SelectionSchemes](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#)\n",
    "- [DistanceCutoff SelectionSchemes](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#QMzyme.SelectionSchemes.DistanceCutoff) (optional)\n",
    "- [QMzymeRegion](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.QMzymeRegion.html)\n",
    "    - [align_to method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.QMzymeRegion.html#QMzyme.QMzymeRegion.QMzymeRegion.align_to) \n",
    "- [CA_terminal TruncationSheme](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.TruncationSchemes.html#QMzyme.TruncationSchemes.CA_terminal)\n",
    "\n",
    "## Required Files\n",
    "\n",
    "To start, you will need:\n",
    "\n",
    "- A fully prepped and protonated PDB.\n",
    "- An output file from geometry optimization.\n",
    "  \n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6db80fb0-8021-424e-a3ea-44956f40e915",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here are the necesary imports for this tutorial!\n",
    "\n",
    "import QMzyme\n",
    "from QMzyme import GenerateModel\n",
    "from QMzyme.SelectionSchemes import DistanceCutoff\n",
    "from QMzyme.data import PDB_xtal, LIG, OPT_C3\n",
    "from QMzyme.RegionBuilder import RegionBuilder\n",
    "import numpy as np\n",
    "import QMzyme.MDAnalysisWrapper as MDAwrapper\n",
    "from MDAnalysis.lib.distances import distance_array\n",
    "import MDAnalysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66b69ce3-880c-4fef-8c63-5f6bd0fcf3d9",
   "metadata": {},
   "source": [
    "## Preparing QM Input File for Optimization\n",
    "\n",
    "Let's start by creating an input file for optimization of 1oh0. For this, we will use the [Molecule Replacement Using QMzyme](https://qmzyme.readthedocs.io/en/latest/Examples/Molecule%20Replacement%20Using%20QMzyme.html) workflow to first create the optimization input file! Specifically, we will be using a distance cutoff of 3 Å with our ligand replaced from EQU to 19NT. This section exists to demonstrate the process of making QM-input and .pkl files for the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "671ba96f-4803-4659-8a06-18808bbff678",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "model = QMzyme.GenerateModel(PDB_xtal)\n",
    "QMzyme.data.residue_charges.update({'EQU': -1}) # EQU ligand\n",
    "model.set_catalytic_center(selection='resname EQU and segid A')\n",
    "model.set_region(selection='resname EQU and segid A', name='old_ligand')\n",
    "model.set_region(selection=DistanceCutoff, cutoff=3)\n",
    "\n",
    "region_19nt = model.import_region(LIG, name='19nt')\n",
    "QMzyme.data.residue_charges.update({'6VW': 0})\n",
    "original_resid = model.old_ligand.atoms[0].resid\n",
    "region_19nt.atom_group.residues.resids = original_resid\n",
    "for atom in region_19nt.atoms:\n",
    "    atom.resid = original_resid\n",
    "    \n",
    "region_19nt.align_to(other=model.old_ligand,\n",
    "    self_selection=\"element C\", other_selection=\"element C\")\n",
    "\n",
    "new_region = model.get_region(\"cutoff_3\")\n",
    "new_region = new_region.subtract(model.old_ligand)\n",
    "new_region = new_region.combine(region_19nt)\n",
    "model.set_region(selection=new_region, name=f\"ksi_19nt_cutoff3\")\n",
    "\n",
    "ca_atoms = model.ksi_19nt_cutoff3.get_atoms(attribute=\"name\", value=\"CA\")\n",
    "model.ksi_19nt_cutoff3.set_fixed_atoms(atoms=ca_atoms)\n",
    "\n",
    "qm_method = QMzyme.QM_Method(\n",
    "    basis_set='6-31G*',\n",
    "    functional='wB97X-D3',\n",
    "    qm_input='OPT FREQ',\n",
    "    program='gaussian'\n",
    ")\n",
    "qm_method.assign_to_region(region=model.ksi_19nt_cutoff3)\n",
    "\n",
    "model.truncate()\n",
    "model.write_input()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67158f56-b173-4128-8a4d-d9aea6dff170",
   "metadata": {},
   "source": [
    "## Loading Pickled QMzymeModel\n",
    "\n",
    "After running the geometry optimization, it is always important to validate your resulting structures! To do this, we will load our original full-system using the `.pkl` file from input file generation. Because quantum chemistry software can sometimes alter atom naming conventions or strip residue metadata, we run a quick loop to map the original names, IDs, and residue labels from our reference model directly onto our optimized atoms. This ensures everything matches perfectly before we align the structures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "cb2adb8b-f489-4866-8122-bf92adaa04b8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "model = QMzyme.GenerateModel(pickle_file='1oh0_equ_crystal.pkl')\n",
    "universe_opt = MDAnalysis.Universe(OPT_C3)\n",
    "region_opt = RegionBuilder(name='opt', atom_group=universe_opt.select_atoms(\"all\")).get_region()\n",
    "ref_region = model.get_region(\"ksi_19nt_cutoff3_truncated\")\n",
    "\n",
    "for opt_atom, ref_atom in zip(region_opt.atoms, ref_region.atoms):\n",
    "    opt_atom.name = ref_atom.name\n",
    "    opt_atom.resid = ref_atom.resid\n",
    "    opt_atom.resname = ref_atom.resname"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f139f6c-0013-4dbf-a67e-28147d9f28c4",
   "metadata": {},
   "source": [
    "## Aligning C-alpha Atoms of Original and Optimized Structuresd\n",
    "\n",
    "The `align_to()` method is present in the QMzymeRegion class. It can be used to align specific atoms and then update the aligned atoms to the region.The alignment procedure also reports RMSD values before and after alignment.\n",
    "\n",
    "⚠ **Important considerations:**\n",
    "\n",
    "- Align the **substrate (self)** to the **full protein–ligand complex (other)**.\n",
    "- Do **not** align the other way around.\n",
    "- This can be ensured by specifying the substrate model before the \"align_to\" function. This will assign the substrate model to \"self\".\n",
    "  \n",
    "This ensures:\n",
    "- The substrate region is updated to match the aligned coordinates.\n",
    "- The substrate moves into the position of the bound ligand.\n",
    "- The ligand does *not* move to the substrate coordinates.\n",
    "\n",
    "Two critical arguments `self_selection=` and `other_selection=` specify the atoms used for alignment. Make sure that the number of atoms in the reference (self) and in the target (other) are the same. If the number of atoms in the selections differs, then the alignment will fail.\n",
    "\n",
    "If your substrate PDB was generated independently (not modified directly from the bound ligand), ensure that corresponding heavy atoms are properly labeled. This will ensure that the correct atoms are getting aligned. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "b611aad9-b8cf-494b-bac2-cd56d6d8f2c7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RMSD before alignment: 72.48750305175781 \\u00C5\n",
      "RMSD after alignment: 4.2482093698830376e-05 \\u00C5\n"
     ]
    }
   ],
   "source": [
    "# Now that region_opt knows which atoms are \"CA\", you can align them.\n",
    "# Note: The standard MDAnalysis selection string for alpha carbons is \"name CA\"\n",
    "region_opt.align_to(\n",
    "    other=ref_region,\n",
    "    self_selection=\"name CA\", \n",
    "    other_selection=\"name CA\"\n",
    ")\n",
    "\n",
    "model.set_region(selection=region_opt, name=\"OPT_aligned\")\n",
    "model.set_region(selection=\"all\", name=\"full_region\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce74e8e5-a27e-4d6c-812b-ec398b83d02e",
   "metadata": {},
   "source": [
    "## Identifying Potential Clasing Residues\n",
    "\n",
    "Now that our optimized geometry is aligned with the full system, we can examine if the optimized structure is feasible with the initial structure! To do this, we use the `find_nearby_residues()` method to compare our aligned, optimized region against the surrounding protein environment. The method calculates the distance between every atom in our optimized model (self) and every atom in the rest of the protein (other), skipping over identical residues. Any residue from the greater environment that falls within our 1.0 Å threshold will be flagged and printed out, showing you exactly which parts of the protein matrix would experience a severe steric clash due to the geometry optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3ea6655e-554f-40ce-9dff-5cf255d59073",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<QMzymeResidue resname: ILE, resid: 28, chain: X> is within 1.0 Å of <QMzymeResidue resname: TYR, resid: 16, chain: A>\n",
      "<QMzymeResidue resname: ILE, resid: 102, chain: X> is within 1.0 Å of <QMzymeResidue resname: VAL, resid: 101, chain: A>\n",
      "<QMzymeResidue resname: GLN, resid: 117, chain: X> is within 1.0 Å of <QMzymeResidue resname: MET, resid: 116, chain: A>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[<QMzymeResidue resname: ILE, resid: 28, chain: X>,\n",
       " <QMzymeResidue resname: ILE, resid: 102, chain: X>,\n",
       " <QMzymeResidue resname: GLN, resid: 117, chain: X>]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Call the method to find any residues in the full region\n",
    "# that is within 1.0 Å from aligned optimized model.\n",
    "model.OPT_aligned.find_nearby_residues(model.full_region, 1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "518c3f6b-4347-4c11-86f7-4e6b87d389ef",
   "metadata": {},
   "source": [
    "## Visualizing Clashing Residues\n",
    "\n",
    "From this, we acquire three potential clashing pairs of residues. However, we cannot assume that residues are clashing from this information only. Thus, to examine how close they are, we will use `pymol_visualize()` to see how close these residues are. This is done by visualizing the overlay of `OPT_aligned` region with `clashing_residues` region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de4c7dd7-426e-4d24-b3af-2c301e6bb319",
   "metadata": {},
   "outputs": [],
   "source": [
    "# First create a region consisting of the clashing residues.\n",
    "model.set_region(selection=\"resid 28 or resid 102 or resid 117\", name=\"clashing_residues\")\n",
    "\n",
    "# Create pymol visualization.\n",
    "model.pymol_visualize()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d64607b-dcf0-4aca-933b-0a02d71bc907",
   "metadata": {},
   "source": [
    "<img src=\"../Images/verify_structure.png\" width=\"600\">\n",
    "\n",
    "When you open the combined clashing_region in PyMOL, the spatial overlap between the full protein environment and your optimized model becomes visually striking. The bulky aromatic ring of TYR 16 has rotated or shifted during the quantum chemical optimization directly into the space occupied by the side chain of ILE 28. From this, we can conclude that the optimized structure is not structurally possible within the original protein structure."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "204b8a74-862a-4c52-aa3f-01f4f07c5bf6",
   "metadata": {},
   "source": [
    "## References Cited\n",
    "\n",
    "The PyMOL Molecular Graphics System, Version 3.0 Schrödinger, LLC."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}